Mining of Frequent Block Preserving Outerplanar Graph Structured Patterns

Authors

  • Yosuke Sasaki
  • Hitoshi Yamasaki
  • Takayoshi Shoudai
  • Tomoyuki Uchida
Abstract

An outerplanar graph is a planar graph that can be embedded in the plane so that all of its vertices lie on the outer boundary. Many semi-structured datasets, such as the NCI dataset of about 250,000 chemical compounds, can be expressed by outerplanar graphs. In this paper, we consider the data mining problem of extracting structural features from such semi-structured data. For this problem, we first define a new graph pattern, called a block preserving outerplanar graph pattern, as an outerplanar graph having structured variables. We then present an effective Apriori-like algorithm that enumerates frequent block preserving outerplanar graph patterns from semi-structured data in incremental polynomial time. Finally, we evaluate the performance of our algorithm by reporting preliminary experimental results on a subset of the NCI dataset.
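To make the "Apriori-like" idea concrete, here is a minimal level-wise sketch. It is not the authors' algorithm: it mines plain itemsets rather than graph patterns with structured variables, and the function name `frequent_itemsets`, the `min_support` parameter, and the toy bond labels are all assumptions made for illustration. It shows only the general scheme of growing patterns one level at a time and pruning any candidate that has an infrequent sub-pattern.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) enumeration of frequent itemsets.

    transactions: list of sets of hashable items (hypothetical stand-in
                  for graph-structured data).
    min_support:  minimum number of transactions that must contain a pattern.
    Returns a dict mapping each frequent pattern (frozenset) to its support.
    """
    def support(pattern):
        # Number of transactions that contain every item of the pattern.
        return sum(1 for t in transactions if pattern <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}

    result = {}
    k = 1
    while level:
        for pattern in level:
            result[pattern] = support(pattern)
        # Candidate generation: join two frequent k-patterns into a (k+1)-pattern.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Apriori pruning: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        level = {c for c in candidates if support(c) >= min_support}
        k += 1
    return result


# Toy data (hypothetical): each compound reduced to the set of bond types it contains.
compounds = [
    {"C-C", "C=O", "C-N"},
    {"C-C", "C=O"},
    {"C-C", "C-N"},
    {"C=O", "C-N", "C-C"},
]
patterns = frequent_itemsets(compounds, min_support=3)
```

In the setting of the abstract, the patterns would be block preserving outerplanar graph patterns and the containment test `pattern <= t` would be replaced by a graph pattern matching step; neither is modeled here.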


Similar resources

Mining frequent subgraphs from ’easy’ classes

Recently, there has been increasing interest in mining structured data. Several frequent subgraph mining systems have been proposed. However, these usually consider general graphs. One can show that frequent subgraph mining for general graphs cannot be performed in output-polynomial time. In practice, however, data usually does not consist of arbitrary graphs but has a much simpler structure. In t...


LWA 2006 Proceedings

In recent years there has been an increased interest in frequent pattern discovery in large databases of graph structured objects. While the frequent connected subgraph mining problem for tree datasets can be solved in incremental polynomial time, it becomes intractable for arbitrary graph databases. Existing approaches have therefore resorted to various heuristic strategies and restrictions of...


An Efficiently Computable Graph-Based Metric for the Classification of Small Molecules

In machine learning, there has been an increased interest in metrics on structured data. The application we focus on is drug discovery. Although graphs have become very popular for the representation of molecules, a lot of operations on graphs are NP-complete. Representing the molecules as outerplanar graphs, a subclass within general graphs, and using the block-and-bridge preserving subgraph i...


Data sanitization in association rule mining based on impact factor

Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations about the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns and so data confidentiality is preserved ag...


Frequent Pattern Mining from Dense Graph Streams

As technology advances, streams of data can be produced in many applications such as social networks, sensor networks, bioinformatics, and chemical informatics. These kinds of streaming data share a property in common—namely, they can be modeled in terms of graph-structured data. Here, the data streams generated by graph data sources in these applications are graph streams. To extract implicit,...



Publication date: 2007